Abstract

In this paper, we develop a novel blind source separation (BSS) method for nonnegative and correlated data, particularly for nearly degenerate data. The motivation lies in nuclear magnetic resonance (NMR) spectroscopy, where multiple mixture NMR spectra are recorded to identify chemical compounds with similar structures (degeneracy).

A number of successful approaches solve BSS problems by exploiting the nature of the source signals. For instance, independent component analysis (ICA) is used to separate statistically independent (orthogonal) source signals. However, signal orthogonality is not guaranteed in many real-world problems. The BSS method developed here deals with nonorthogonal signals. The independence assumption is replaced by a condition which requires each source signal to have dominant interval(s) (DI) over the others. Additionally, the mixing matrix is assumed to be nearly singular. The method first estimates the mixing matrix by exploiting the geometry of the data through clustering. Due to the degeneracy of the data, a small deviation in the estimate may introduce errors (spurious peaks, with negative values in most cases) in the output. To resolve this problem and improve the robustness of the separation, two techniques are developed. One finds a better estimate of the mixing matrix by allowing a constrained perturbation to the clustering output, which is achieved by quadratic programming. The other seeks sparse source signals by exploiting the DI condition, which leads to an $\ell_1$ optimization problem. We present numerical results on NMR data to show the performance and reliability of the method in applications arising in NMR spectroscopy.
1 Introduction



Blind source separation (BSS) is a major area of research in signal and image processing. It aims at recovering source signals from their mixtures without detailed knowledge of the mixing process. Applications of BSS include signal analysis and processing of speech, image, and biomedical signals, especially signal extraction, enhancement, denoising, model reduction and classification problems [5]. The goal of this paper is to study new BSS methods for nearly degenerate data arising from Nuclear Magnetic Resonance (NMR) spectroscopy. The BSS problem is defined by the following matrix model

$$X = AS, \quad \text{with } A_{ij} \ge 0, \ S_{ij} \ge 0, \tag{1.1}$$

where $X \in \mathbb{R}^{m \times p}$, $A \in \mathbb{R}^{m \times n}$, $S \in \mathbb{R}^{n \times p}$. Rows of $X$ represent the measured mixed signals, and rows of $S$ are the source signals. $X$ and $S$ are sampled functions of an acquisition variable, which may be time, frequency, position, wavenumber, etc., depending on the underlying physical process. Hence there are $p$ samples in the measurements. The objective of BSS is to solve for $A$ and $S$ given $X$. In the context of NMR spectroscopy, the mixing coefficients are not typically measured. This is where BSS techniques become useful. The problem is also known as nonnegative matrix factorization (NMF [IE]). Similar to factorizing a composite number (48 = 6*8 = 8*6 = 4*12 = 12*4 = 2*24 = 24*2 = 3*16 = 16*3), there are permutation and scaling ambiguities in solutions to BSS. For any permutation matrix $P$ and invertible diagonal matrix $\Lambda$, $(AP\Lambda, \Lambda^{-1}P^{-1}S)$ is another pair equivalent to the solution $(A, S)$, since

$$X = AS = (AP\Lambda)(\Lambda^{-1}P^{-1}S).$$

Various BSS methods have been proposed that rely on a priori knowledge of the source signals, such as spatio-temporal decorrelation, statistical independence, sparseness, nonnegativity, etc. [3 El H21 CS1 HS1 E01 EH 123 EDI ED E21 E3]. Recently there has been considerable interest in solving nonnegative BSS problems, which emerge in computer tomography, biomedical image processing, and NMR spectroscopy [21 [3J [HI [T71 HH1 [231 [251 EH1 ET1 EH [291 [301 EH E21 [331 E5]. This work originated from analytical chemistry, in particular NMR spectroscopy. Applications include identification of organic compounds, metabolic fingerprinting, disease diagnosis, and drug design, as chemical mixtures abound in human organs, for example in blood, urine, and metabolites in brain and muscles. Each compound has a unique spectral fingerprint defined by the number, intensity and locations of its NMR peaks. In drug design, structural information must be isolated from spectra that also contain the target molecule, side products, and impurities.

The different spectra come from the Fourier transform of NMR measurements of the absorbance of radio frequency radiation by receptive nuclear spins of the same mixture sample at different time segments when exposed to high magnetic fields. The NMR spectra are nonnegative. Besides, NMR spectra of different chemical compounds are usually not independent. Especially when compounds (component molecules) have similar functional groups, the peaks overlap in the composite NMR spectra, making it difficult to identify the compounds involved. ICA-type approaches recover independent source signals and thus are unable to separate NMR source spectra. New methods need to
be invented to handle this class of data. Recently, nonnegative BSS has attracted considerable attention in NMR spectroscopy [II [III [23 [2H1 Ell E21 ESI Ell EHl EZ]. For example, Naanaa and Nuzillard (NN) proposed a nonnegative BSS method in [25] based on a strict local sparseness assumption on the source signals. The NN assumption (NNA) requires the source signals to be strictly non-overlapping at some locations of the acquisition variable (e.g., frequency). In other words, each source signal must have a stand-alone peak at a location where the other sources are strictly zero. Such a strict sparseness condition leads to a dramatic mathematical simplification of the general nonnegative matrix factorization problem (1.1), which is non-convex. Geometrically speaking, the problem of finding the mixing matrix $A$ reduces to the identification of a minimal cone containing the columns of the mixture matrix $X$. The latter can be achieved by linear programming. In fact, NN's sparseness assumption and the geometric construction of the columns of $A$ were known in the 1990s [2], [35] in the problem of blind hyperspectral unmixing, where the same mathematical model (1.1) is used. The analogue of NN's assumption is called the pixel purity assumption [6]. The resulting geometric (cone) method is the so-called N-findr [35], and is now a benchmark in hyperspectral unmixing. NN's method can be viewed as an application of N-findr to NMR data. Measured NMR data may not strictly satisfy NN's sparseness conditions, which introduces spurious peaks in the results. Postprocessing methods have been developed to address the resulting errors. Such a study has been performed recently for (over)-determined mixtures [30], where it is found that larger peaks in the signals are more reliable and can be used to minimize errors due to the lack of strict sparseness.

In this paper, we consider how to separate the data if the NN assumption is not satisfied. We are concerned with the regime where source signals do not have stand-alone peaks, yet one source signal dominates the others over certain intervals of the acquisition variable. In other words, a dominant interval(s) (DI) condition is required of the source signals. This is a reasonable condition for many NMR spectra. For example, the DI condition holds well in the NMR data which motivated us. The data is produced by the so-called DOSY (diffusion ordered spectroscopy) experiment, where a physical sample of mixed chemical compounds in a solvent (water) is prepared. DOSY tries to distinguish the chemicals based on variation in their diffusion rates. However, DOSY fails to separate them if the compounds have similar chemical functional groups (i.e., they have similar diffusion rates). In this application, the diffusion rates of the chemicals serve as the mixing coefficients. This presents an additional mathematical challenge due to the near singularity of the mixing matrix. Separating these degenerate data is intractable for the convex cone methods, and thus we are prompted to develop new approaches. Examination of the DI condition reveals a great deal about the geometry of the mixtures. In fact, the scatter plot of the columns of $X$ must contain several clusters of points, and these clusters are centered at the columns of $A$. Hence, the problem of finding $A$ boils down to the identification of the clusters, which can be accomplished by data clustering, for example K-means. Although data clustering in general produces a fairly good estimate of the mixing matrix, its output deviates from the true solution due to the presence of noise, the initial guess of the clustering algorithm, and so on. In the case of a nearly singular mixing matrix, a small perturbation can lead to large errors in the source recovery (e.g., spurious peaks). To overcome this difficulty and improve the robustness of the separation, we propose two different methods. One is to find a better estimate of the mixing matrix by allowing a constrained perturbation to the clustering output, which is achieved by quadratic programming. The intention is to move the estimate closer to the true solution. The other is to seek sparse source signals by exploiting the DI condition. An $\ell_1$ optimization problem is formulated for recovering the source signals.

The paper is organized as follows. In Section 2, we review the essentials of the NN approach, propose a new condition on the source signals motivated by NMR spectroscopy data, and introduce the method. In Section 3, we illustrate the method with numerical examples, including the processing of an experimental DOSY NMR data set. Section 4 is the conclusion. We shall use the following notation throughout the paper: $A^j$ stands for the $j$-th column of matrix $A$, $S^j$ for the $j$-th column of matrix $S$, and $X^j$ for the $j$-th column of matrix $X$, while $S_j$ and $X_j$ are the $j$-th rows of matrices $S$ and $X$, or the $j$-th source and mixture, respectively.

This work was partially supported by NSF-ADT grant DMS-0911277 and NSF 
grant DMS-0712881. The authors thank Professor A.J. Shaka and Dr. Hasan Celik 
for helpful discussions and their experimental NMR data. 

2 The method 

In this paper, we shall consider the determined case ($m = n$), although the results can be easily extended to the over-determined case ($m > n$). Consider the linear model (1.1), where each column of $X$ represents data collected at a particular value of the acquisition variable, and each row represents a mixture spectrum. In this section, we shall first briefly review the NN method, then introduce the new source conditions and the method.

2.1 NN approach 

In [25], Naanaa and Nuzillard (NN) presented an efficient sparse BSS method and its mathematical analysis for nonnegative and partially orthogonal signals such as NMR spectra. Consider the (over)-determined regime where the number of mixtures is no less than that of sources ($m \ge n$), and the mixing matrix $A$ has full rank. In simple terms, NN's key sparseness assumption (referred to as NNA below) on the source signals is that each source has a stand-alone peak at some location of the acquisition variable where the other sources are identically zero. More precisely, the source matrix $S \ge 0$ is assumed to satisfy the following condition.

Assumption (NNA): For each $i \in \{1, 2, \dots, n\}$ there exists a $j_i \in \{1, 2, \dots, p\}$ such that $s_{i,j_i} > 0$ and $s_{k,j_i} = 0$ $(k = 1, \dots, i-1, i+1, \dots, n)$.

Eq. (1.1) can be rewritten in terms of columns as

$$X^j = \sum_{k=1}^{n} s_{k,j} A^k, \quad j = 1, \dots, p, \tag{2.1}$$

where $X^j$ denotes the $j$th column of $X$, and $A^k$ the $k$th column of $A$. Assumption NNA implies that $X^{j_i} = s_{i,j_i} A^i$, $i = 1, \dots, n$, or $A^i = \frac{1}{s_{i,j_i}} X^{j_i}$. Hence Eq. (2.1) is rewritten as

$$X^j = \sum_{i=1}^{n} \frac{s_{i,j}}{s_{i,j_i}} X^{j_i}, \tag{2.2}$$

which says that every column of $X$ is a nonnegative linear combination of the columns of $A$. Here $A = [X^{j_1}, \dots, X^{j_n}]$ is the submatrix of $X$ consisting of $n$ columns, each of which is collinear to a particular column of $A$. It should be noted that the $j_i$ ($i = 1, \dots, n$) are not known and have to be computed. Once all the $j_i$'s are found, an estimate of the mixing matrix is obtained. The identification of $A$'s columns is equivalent to identifying a convex cone of a finite collection of vectors. The cone encloses the data columns of matrix $X$, and is the smallest of such cones. Such a minimal enclosing convex cone can be found by linear programming methods. Mathematically, the following constrained equations are formulated for the identification of $A$:

$$\sum_{j=1,\, j \ne k}^{p} X^j \lambda_j = X^k, \quad \lambda_j \ge 0, \quad k = 1, \dots, p. \tag{2.3}$$

Then a column $X^k$ will be a column of $A$ if and only if the constrained equation (2.3) is inconsistent. However, if noise is present, the following optimization problems are suggested to estimate the mixing matrix:

$$\text{minimize} \quad \text{score} = \Big\| \sum_{j=1,\, j \ne k}^{p} X^j \lambda_j - X^k \Big\|_2, \quad k = 1, \dots, p,$$

$$\text{subject to} \quad \lambda_j \ge 0.$$

A score is associated with each column. A column with a low score is unlikely to be a column of $A$ because this column is roughly a nonnegative linear combination of the other columns of $X$. On the other hand, a high score means that the corresponding column is far from being a nonnegative linear combination of the other columns. In practice, the $n$ columns of $X$ with the highest scores are selected to form $A$, the mixing matrix. The Moore-Penrose inverse $A^{+}$ of $A$ is then computed and an estimate of $S$ is obtained: $S = A^{+} X$. The NN method proves to be both accurate and efficient if the NNA condition holds. However, if the condition is not satisfied, errors and artifacts may be introduced because the columns of the true mixing matrix no longer span the smallest enclosing convex cone of the columns of the data matrix.
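
The scoring step can be sketched on toy data as follows. This is only an illustration under stated assumptions: the nonnegative least-squares solver is a simple projected-gradient loop written for this example (it is not the linear programming machinery of [25]), and the function names and the four toy columns are hypothetical.

```python
# Sketch of NN-style column scoring with a hand-rolled nonnegative least squares.
# The projected-gradient solver is a stand-in chosen for brevity, not the method of [25].

def combine(cols, lam):
    """Return sum_j lam[j] * cols[j] for a list of column vectors."""
    dim = len(cols[0])
    return [sum(l * c[i] for l, c in zip(lam, cols)) for i in range(dim)]

def nnls_residual(cols, target, iters=2000):
    """Approximate min over lam >= 0 of || sum_j lam_j cols_j - target ||_2."""
    # Step 1/L, with L bounded by the squared Frobenius norm of the column set.
    step = 1.0 / sum(v * v for c in cols for v in c)
    lam = [0.0] * len(cols)
    for _ in range(iters):
        resid = [f - t for f, t in zip(combine(cols, lam), target)]
        grad = [sum(r * v for r, v in zip(resid, c)) for c in cols]
        lam = [max(0.0, l - step * g) for l, g in zip(lam, grad)]
    resid = [f - t for f, t in zip(combine(cols, lam), target)]
    return sum(r * r for r in resid) ** 0.5

def nn_scores(columns):
    """Score each column by its distance to the cone spanned by the other columns."""
    return [nnls_residual(columns[:k] + columns[k + 1:], columns[k])
            for k in range(len(columns))]

# Toy data: columns 0 and 1 play the role of stand-alone peaks, 2 and 3 are mixtures.
X_cols = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.3, 0.7]]
scores = nn_scores(X_cols)
# Indices of the two highest-scoring columns, i.e. the estimated columns of A.
top2 = sorted(sorted(range(len(scores)), key=lambda k: -scores[k])[:2])
print(scores, top2)
```

In this toy case the two mixture columns lie inside the cone of the others and receive scores near zero, while the two extreme columns receive clearly positive scores and are selected.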

Recently, the authors have developed postprocessing techniques to improve NN results when mixture data are abundant, and to improve the mixing matrix estimate with major-peak-based corrections [30]. The work in [30] actually considered a relaxed NNA (rNNA) condition.

Assumption (rNNA): For each $i \in \{1, 2, \dots, n\}$ there exists a $j_i \in \{1, 2, \dots, p\}$ such that $s_{i,j_i} > 0$ and $s_{k,j_i} = \epsilon_k$ $(k = 1, \dots, i-1, i+1, \dots, n)$, where $\epsilon_k \ll s_{i,j_i}$.

Simply put, each source signal has a dominant peak at an acquisition position where the other sources are allowed to be nonzero. The NNA condition is recovered if all $\epsilon_k = 0$. The rNNA is more realistic and robust than the ideal NNA for real-world NMR data.

Figure 1: Three source signals with dominant intervals (left panel); the geometry of the mixture matrix (right panel). The centers (red diamonds) of the three clusters are detected by k-means.

2.2 Source assumption and mixing matrix 

Motivated by the DOSY NMR spectra, we propose here a different relaxed NN condition on the source signals. Note that the rows $S_1, S_2, \dots, S_n$ of $S$ are the source signals, and they are required to satisfy the following condition: for $i = 1, 2, 3, \dots, n$, source signal $S_i$ is required to have dominant interval(s) over $S_n, \dots, S_{i+1}, S_{i-1}, \dots, S_2, S_1$, while $S_i$ is allowed to overlap with the other signals in the rest of the acquisition region. More formally, the source matrix $S$ satisfies the following condition.

Assumption. For each $k \in \{1, 2, 3, \dots, n\}$, there is a set $I_k \subset \{1, 2, \dots, p\}$ such that for each $l \in I_k$, $s_{k,l} > s_{j,l}$, $j = 1, 2, \dots, k-1, k+1, \dots, n$.

We shall call this the dominant interval condition, or DI condition. Fig. 1 is an idealized example of three DI source signals. In addition to the DI source condition, the mixing matrix is required to be nearly singular. The motivation is the similar diffusion rates of chemicals with similar structures. Inverting a nearly singular matrix poses a mathematical challenge, since a small error in the recovered mixing matrix might lead to a considerable deviation in the source recovery. Among the singularly mixed signals (or degenerate data), in this paper we shall consider the following two types: 1) the columns of the mixing matrix are parallel; 2) one column of the mixing matrix is a nonnegative linear combination of the others. Case 1 is motivated by NMR of chemicals with similar diffusion rates. We shall call this condition the parallel column condition, or PCC. Case 2 can also be encountered in NMR spectroscopy of chemicals, and we shall call it the one column degenerate condition, or OCDC. Please note that both PCC and OCDC should be considered to hold approximately in real-world data.
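
As a small illustration of the DI condition, the index sets $I_k$ can be computed directly from a source matrix. The sketch below is not part of the method itself; the function names and the toy matrix are hypothetical, and the strict inequality is taken literally from the assumption above.

```python
def dominant_intervals(S):
    """For each source k, list the sample indices l where s[k][l] exceeds every other s[j][l]."""
    n, p = len(S), len(S[0])
    return [[l for l in range(p)
             if all(S[k][l] > S[j][l] for j in range(n) if j != k)]
            for k in range(n)]

def satisfies_di(S):
    """DI condition: every source has a nonempty index set I_k."""
    return all(len(I_k) > 0 for I_k in dominant_intervals(S))

# Toy sources (rows): they overlap everywhere, but each one dominates somewhere.
S = [[5.0, 4.0, 0.5, 0.2, 0.3, 1.0],
     [0.4, 0.3, 6.0, 5.0, 0.2, 1.0],
     [0.1, 0.2, 0.3, 0.4, 7.0, 1.0]]
print(dominant_intervals(S))
```

The last sample, where all three sources are equal, belongs to no $I_k$; such overlapping samples are allowed as long as each source still dominates on some other set of samples.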
2.3 Our approach



2.3.1 Data clustering 

Now suppose we have a set of nearly degenerate signals from DI sources. We require that, compared with the size of the dominant interval(s) in the acquisition region, the region where the source signals overlap is much smaller. In fact, this is a reasonable assumption for the NMR data which motivates us. More importantly, this requirement enables the success of the clustering method. Next, we shall estimate the columns of the mixing matrix $A$ by data clustering. The dominant interval(s) of each source signal imply that there is a region where the source $S_i$ dominates the others. More precisely, there are columns of $X$ such that

$$X^k = s_{i,k} A^i + \sum_{j \ne i} o_{j,k} A^j, \tag{2.4}$$

where $s_{i,k}$ dominates the $n-1$ coefficients $o_{j,k}$ ($j \ne i$), i.e., $s_{i,k} \gg o_{j,k}$. The identification of $A^i$ ($i = 1, \dots, n$) is equivalent to finding a cluster formed by these $X^k$'s in $\mathbb{R}^n$. As illustrated in the geometry plot of $X$ in Fig. 1, three clusters are formed. Many clustering techniques are available for locating these clusters; for example, k-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. We shall use k-means analysis in this paper because it is computationally fast and easy to implement. For an example of three DI source signals with a mixing matrix satisfying the OCDC condition, the three centers are shown in Fig. 1. For real-world data, we show an example of NMR spectra of a quinine, geraniol, and camphor mixture in Fig. 2. The cluster in the middle implies that the OCDC condition holds well for this data. The NN method (and other convex cone methods) would fail to separate the source signals due to the degeneracy of the mixing matrix. Although it might be able to identify two columns of $A$ as the two edges, it cannot locate $A$'s degenerate column. For the PCC degenerate case, clustering is also able to deliver a good estimate, even when the data is contaminated by noise. We show the results in Fig. 3, where the three clusters are very close due to the PCC degeneracy. The NN solution would deviate considerably from the true solution. For the data we tested, clustering techniques like k-means work well when the condition number of the mixing matrix is up to $10^8$. Though the mixing matrices obtained by clustering methods are rather good estimates of the true solution, small deviations from the true ones will introduce large errors in the source recovery ($S = \mathrm{inverse}(A)\, X$). Next we propose two approaches to overcome this difficulty. Both approaches require solving optimization problems. The first one intends to improve the source recovery by seeking a better mixing matrix, while the second approach reduces the spurious peaks by imposing a sparsity constraint on the sources.
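
The clustering step can be sketched as follows. Several choices here are assumptions made for the illustration rather than details given in the paper: the data columns are scaled to unit length before clustering, the k-means initialization is a deterministic farthest-point rule, and the mixing matrix is a hypothetical, well-conditioned one so that the clusters are easy to see.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dist2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iters=50):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda q: min(dist2(q, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for q in points:
            groups[min(range(k), key=lambda c: dist2(q, centers[c]))].append(q)
        # Mean of each group; an empty group keeps its previous center.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[c]
                   for c, g in enumerate(groups)]
    return centers

# Hypothetical mixing matrix, stored by columns A^1, A^2, A^3.
A_cols = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
# Toy DI sources: in each block of 10 samples one source dominates the other two.
p = 30
S = [[(1.0 + 0.1 * (l % 3)) if l // 10 == k else 0.02 for l in range(p)]
     for k in range(3)]
# Columns of X = A S, scaled to unit length so each cluster sits near A^k / ||A^k||.
X_cols = [normalize([sum(A_cols[k][i] * S[k][l] for k in range(3)) for i in range(3)])
          for l in range(p)]
centers = kmeans(X_cols, 3)
# Cosine similarity between each true column direction and its best-matching center.
match = [max(sum(a * c for a, c in zip(normalize(col), normalize(ctr)))
             for ctr in centers) for col in A_cols]
print(match)
```

Each center recovers a column of the mixing matrix only up to scale, which is consistent with the scaling ambiguity of BSS. The small residual angle between a center and the true column comes from the non-dominant sources inside each dominant interval.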

Figure 2: Real data example: three columns of $A$ are identified as the three center points (red diamonds) attracting most points in the scatter plot of the columns of $X$ (left), and the three rows of $X$ (right). The NN method identifies two columns of $A$ as the points in the blue circle.

Figure 3: Example of the PCC case: three columns of $A$ are identified as the three center points (red diamonds) attracting most points in the scatter plot of the columns of $X$ (left), and the three rows of $X$ (right).

2.3.2 Better inverse of the mixing matrix

Suppose the estimate of the mixing matrix obtained by clustering is $\hat{A}$. Then the source recovery can be obtained as $S = \mathrm{inverse}(\hat{A})\, X$. As discussed above, errors in $S$ could be introduced even by a small deviation of $\hat{A}$ from the ground truth. Negative spurious peaks are produced in most cases; see the Fig. where the negative peaks in the left plot can be viewed as bleed-through from another source. Clearly, a better estimate of the mixing matrix is required to reduce these spurious peaks. Instead of looking for a better mixing matrix, we propose to solve the following optimization problem for a better inverse of the matrix,

$$\min_{B} \ \frac{1}{2} \| I - \hat{A} B \|_2^2 \quad \text{subject to} \quad BX \ge 0, \tag{2.5}$$

where $I \in \mathbb{R}^{m \times m}$ is the identity matrix. The constraint $BX \ge 0$ is used to reduce the negative values introduced in the source recovery. (2.5) is a linearly constrained quadratic program and it can be solved by a variety of methods including interior point, gradient projection, active sets, etc. In this paper, an interior point algorithm is used. Once the minimizer $B^*$ is obtained, we solve for the sources by $S = B^* X$.
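
To make explicit why (2.5) is a quadratic program with linear constraints, one can stack the columns of $B$ into a single vector. The following is only a sketch: it assumes the norm in (2.5) is taken entrywise (the Frobenius norm), and the symbols $b$, $\operatorname{vec}$ and the Kronecker product $\otimes$ are notation introduced here for illustration. With $\hat{A} \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times m}$,

$$
\begin{aligned}
b &= \operatorname{vec}(B) \in \mathbb{R}^{nm}, \qquad \operatorname{vec}(\hat{A} B) = (I_m \otimes \hat{A})\, b, \\
\tfrac{1}{2} \| I - \hat{A} B \|_F^2 &= \tfrac{1}{2}\, b^{T} \big( I_m \otimes \hat{A}^{T} \hat{A} \big)\, b \;-\; \operatorname{vec}(\hat{A}^{T})^{T} b \;+\; \tfrac{m}{2}, \\
BX \ge 0 \;&\Longleftrightarrow\; (X^{T} \otimes I_n)\, b \ge 0 .
\end{aligned}
$$

The Hessian $I_m \otimes \hat{A}^{T}\hat{A}$ is positive semidefinite, so under this reading the problem is a convex quadratic program in $b$ with $np$ linear inequality constraints.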

2.3.3 Sparser source signals 

The method proposed above works well for mixing matrices whose condition number is up to $10^8$. If the mixing matrix is much more ill-conditioned, the problem (1.1) effectively becomes under-determined. It appears that solving the equation exactly for $S$ is hopeless even if an accurate $A$ is provided. However, a meaningful solution is possible if the actual source signals are structurally compressible, meaning that they essentially depend on a low number of degrees of freedom. Although the source signals (rows of $S$) are not sparse, the columns of $S$ possess sparsity due to the dominant intervals condition. Hence, we seek the sparsest solution for each column $S^l$ of $S$ as

$$\min \|S^l\|_0 \quad \text{subject to} \quad A S^l = X^l, \ S^l \ge 0. \tag{2.6}$$

Here $\|\cdot\|_0$ (the 0-norm) represents the number of nonzeros. Because of the non-convexity of the 0-norm, we minimize the $\ell_1$-norm:

$$\min \|S^l\|_1 \quad \text{subject to} \quad A S^l = X^l, \ S^l \ge 0, \tag{2.7}$$

which is a linear program [11] because $S^l$ is non-negative. The fact that data may in general contain noise suggests solving the following optimization problem, in which the equality constraint is relaxed,

$$\min_{S^l \ge 0} \ \mu \|S^l\|_1 + \frac{1}{2} \|X^l - A S^l\|_2^2, \tag{2.8}$$

for which the Bregman iterative method [151 EH] with a proper projection onto the non-negative convex subset can be used to obtain a solution. Under certain conditions on the matrix $A$, it is known [5], [39] that the solution of the $\ell_1$-minimization (2.7) gives the exact recovery of a sufficiently sparse signal, i.e., the solution to (2.6). Though our numerical results support the equivalence of $\ell_1$ and $\ell_0$ minimizations, the mixing matrix $A$ does not satisfy the existing sufficient conditions [5], [39].
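
A problem of the form (2.8) can be sketched on a single data column as follows. Note the swapped-in technique: this sketch uses a plain projected proximal-gradient iteration rather than the Bregman iterative method cited above. The matrix, the value of the weight `mu`, the iteration count, and the function names are all hypothetical choices made for the illustration, and the columns of the matrix are assumed to be scaled to unit length.

```python
import math

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def nonneg_l1_ls(A_cols, x, mu, iters=20000):
    """Minimize mu*||s||_1 + 0.5*||x - A s||_2^2 over s >= 0 by projected proximal gradient."""
    step = 1.0 / sum(v * v for c in A_cols for v in c)  # 1/||A||_F^2, a safe step size
    s = [0.0] * len(A_cols)
    for _ in range(iters):
        fit = [sum(sk * c[i] for sk, c in zip(s, A_cols)) for i in range(len(x))]
        resid = [f - xi for f, xi in zip(fit, x)]
        grad = [sum(r * v for r, v in zip(resid, c)) for c in A_cols]
        # For s >= 0 the l1 norm is linear, so its prox is a shift by step*mu, then clipping at 0.
        s = [max(0.0, sk - step * (g + mu)) for sk, g in zip(s, grad)]
    return s

# Hypothetical OCDC-type matrix: the third column is parallel to A^1 + A^2 (rank 2).
a1, a2 = [0.8, 0.6, 0.0], [0.0, 0.6, 0.8]
a3 = unit([u + v for u, v in zip(a1, a2)])
A_cols = [a1, a2, a3]
# A data column lying in a dominant interval of the third source: X^l = 2 * A^3.
x = [2.0 * v for v in a3]
s = nonneg_l1_ls(A_cols, x, mu=0.01)
print(s)
```

Here the matrix is exactly singular, so the column $X^l$ has infinitely many nonnegative representations; the $\ell_1$ term selects the one supported on the third source alone, with its amplitude shrunk slightly by the weight $\mu$.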

3 Numerical experiments 

In this section, we report numerical examples solved by the method. We compute three examples. The data of the first two examples are synthetic, while the third example uses real NMR data. In the first example, two sources are to be separated from two mixtures. The mixtures are constructed from two real NMR source signals by simulating the linear model (1.1). The two columns of the mixing matrix are nearly parallel, and its condition number is about $1.25 \times 10^8$. The true mixing matrix $A$, its estimate $\hat{A}$ via clustering, and the improved estimate $A_v$ obtained by solving (2.5) are (for ease of comparison, the first rows of $\hat{A}$ and $A_v$ are scaled to be the same as that of $A$)

$$A = \begin{pmatrix} 0.894427190999916 & 0.894427182055644 \\ 0.447213595499958 & 0.447213613388501 \end{pmatrix},$$

$$\hat{A} = \begin{pmatrix} 0.894427190999916 & 0.894427182055644 \\ 0.447213596792237 & 0.447213596447341 \end{pmatrix},$$

$$A_v = \begin{pmatrix} 0.894427190999916 & 0.894427182055644 \\ 0.447213595582167 & 0.447213613388502 \end{pmatrix}.$$

Clearly $A_v$ is a better estimate. The mixtures are plotted in Fig. 4 and the results are presented in Fig. 5.
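
The sensitivity of the source recovery to the clustering error can be checked directly from the values listed above. The sketch below is an illustration only; the variable and function names are hypothetical. It uses the fact that the recovered sources satisfy inverse($\hat{A}$) $X$ = (inverse($\hat{A}$) $A$) $S$, so the 2-by-2 product inverse($\hat{A}$) $A$ shows how much of each true source leaks into each recovered one.

```python
def inv2(M):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Values copied from the true mixing matrix and its clustering estimate in the text.
A_true = [[0.894427190999916, 0.894427182055644],
          [0.447213595499958, 0.447213613388501]]
A_hat = [[0.894427190999916, 0.894427182055644],
         [0.447213596792237, 0.447213596447341]]

# The identity matrix would mean perfect recovery; off-diagonal entries are bleed-through.
leak = matmul2(inv2(A_hat), A_true)
max_entry_error = max(abs(A_true[i][j] - A_hat[i][j])
                      for i in range(2) for j in range(2))
print(max_entry_error, leak)
```

Although the two matrices differ entrywise by less than $2 \times 10^{-8}$, the product is far from the identity, with a negative off-diagonal entry of order one. This is the kind of negative bleed-through that the constrained problem (2.5) is designed to reduce.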

In the second example, three sources are to be separated from three mixtures. The mixing matrix satisfies the OCDC condition, i.e., one of its columns is a nonnegative linear combination of the other two. To test the robustness of the method, we added Gaussian noise (SNR = 60 dB) to the data. The mixtures and their geometric structure are plotted in Fig. 6. First, data clustering was used to obtain an estimate of the mixing matrix, then an $\ell_1$ optimization problem was solved to retrieve the sources. The results are shown in Fig. 7. It can be seen that the recovered sources agree well with the ground truth.

For the third example, we test our method on a set of real data. The data is produced by diffusion ordered spectroscopy (DOSY), which is an NMR spectroscopy technique used by chemists for mixture separation [22]. However, the three compounds used in the experiment (quinine, geraniol, and camphor) have similar chemical functional groups (i.e., there is overlap in their NMR spectra) [23], so DOSY fails to separate them. It is known that each of the three sources has dominant interval(s) over the others in its NMR spectrum. This can also be verified from the three isolated clusters formed in the geometry of their mixed NMR spectra. Here we separate three sources from three mixtures. Fig. 8 plots the mixtures (rows of $X$) and their geometry (columns of $X$), where three clusters of points can be spotted. The columns of $A$ are then identified as the center points of the three clusters. The solutions are presented in Fig. 9; the results are satisfactory in comparison with the ground truth. For comparison, the source signals recovered by NN [25] are shown in Fig. 10, where $S = \mathrm{inverse}(A)\, X$; here the inverse is the Moore-Penrose pseudo-inverse (in the least squares sense), which produces some negative (erroneous) peaks in $S$.



4 Conclusion 

This paper presented novel methods to retrieve source signals from nearly degenerate mixtures. The motivation comes from NMR spectroscopy of chemical compounds with similar diffusion rates. Inspired by the NMR structure of these chemicals, we propose a viable source condition which requires each source signal to have dominant interval(s) over the others. This condition is well suited for many real-life signals. Besides, the nearly degenerate mixtures are assumed to be generated by the following two types of mixing processes: 1) all the columns of the mixing matrix are parallel; 2) one column of the mixing matrix is a nonnegative linear combination of the others. We first use data clustering to identify the mixing matrix, then we develop two approaches to improve the recovery of the source signals. The first approach solves a constrained quadratic program for a better mixing matrix, while the second seeks the sparsest solution for each column of the source matrix by solving an $\ell_1$ optimization problem. Numerical results on NMR spectra show satisfactory performance of our method and offer promise towards understanding and detecting complex chemical spectra. Though the methods are motivated by NMR spectroscopy, the underlying ideas may be generalized to different data sets in other applications.

Figure 4: Recovered sources by clustering (left column) and the ground truth (right column).

Figure 5: Recovered sources by clustering (left column), the ground truth (middle column), and the improved results by a better estimate of the mixing matrix (right column).

Figure 7: Three sources (left column) and their recovery by clustering and $\ell_1$ minimization.

Figure 8: Three columns of $A$ are identified as the three center points in blue circles attracting most points in the scatter plot of the columns of $X$ (left), and the three rows of $X$ (right).

Figure 9: The recovered source signals by nonnegative $\ell_1$ (left) and the ground truth (right).

Figure 10: The recovered source signals using the NN method (left) and the ground truth (right).